A rigorous geometry-probability equivalence in characterization of $\ell_1$-optimization

MIHAILO STOJNIC

School of Industrial Engineering
Purdue University, West Lafayette, IN 47907
e-mail: mstojnic@purdue.edu

Abstract

In this paper we consider under-determined systems of linear equations that have sparse solutions. This
subject has attracted an enormous amount of interest in recent years, primarily due to the influential
works [8, 16]. In a statistical context it was rigorously established for the first time in [8, 16] that if
the number of equations is smaller than, but still linearly proportional to, the number of unknowns, then a
sparse vector of sparsity also linearly proportional to the number of unknowns can be recovered through a
polynomial $\ell_1$-optimization algorithm (assuming, of course, that such a sparse solution vector exists).
Moreover, the geometric approach of [16] produced the exact values for the proportionalities in question.
In our recent work [43] we introduced an alternative statistical approach that produced attainable values
of the proportionalities. Those happened to be in excellent numerical agreement with the ones of [16]. In
this paper we give a rigorous analytical confirmation that the results of [43] indeed match those from [16].

Index Terms: Linear systems; Neighborly polytopes; $\ell_1$-optimization.



1 Introduction 



The main concern of this paper is an analytical study of under-determined systems of linear equations that
have sparse solutions. To that end, let us assume that there is a $k$-sparse $n$-dimensional vector $x$ such that

$$y = Ax \tag{1}$$

for an $m \times n$ ($m < n$) statistical matrix $A$ and an $m \times 1$ vector $y$ (see Figure 1; here and
in the rest of the paper, under a $k$-sparse vector we assume a vector that has at most $k$ nonzero
components; also, in the rest of the paper we will assume the so-called linear regime, i.e. we will assume
that $k = \beta n$ and that the number of equations is $m = \alpha n$, where $\alpha$ and $\beta$ are
constants independent of $n$; more on the non-linear regime, i.e. on the regime when $m$ is larger than
linearly proportional to $k$, can be found in e.g. [12, 24, 25]). We then look at the inverse problem: given
the $A$ and $y$ from (1), can one then recover the $k$-sparse $x$ in (1)?

There are of course many ways in which one can attempt to recover the $k$-sparse $x$. If one has the freedom
to design $A$ in parallel with designing the recovery algorithm, then the results from [3, 32, 36]
demonstrated that the techniques from coding theory (based on the coding/decoding of Reed-Solomon codes)
can be employed to determine any $k$-sparse $x$ in (1) for any $0 < \alpha \leq 1$ and any
$\beta \leq \frac{\alpha}{2}$ in polynomial time. It is relatively easy to show that under the unique
recoverability assumption $\beta$ cannot be greater than $\frac{\alpha}{2}$. Therefore, as long as one is
concerned with the unique recovery of the $k$-sparse $x$ in (1) in polynomial time, the results from
[3, 32, 36] are optimal. The complexity of the algorithms from [3, 32, 36] is roughly $O(n^3)$. In a similar
fashion one can, instead of using coding/decoding techniques associated with Reed-Solomon codes, design the
matrix $A$ and the corresponding recovery algorithm based on the techniques related to the coding/decoding
of Expander codes (see e.g. [30, 31, 47] and references therein). In that case recovering $x$ in (1) is
significantly faster for large dimensions $n$. Namely, the complexity of the techniques from e.g.
[30, 31, 47] (or their slight modifications) is usually $O(n)$, which is clearly, for large $n$,
significantly smaller than $O(n^3)$. However, the techniques based on coding/decoding of Expander codes
usually do not allow for $\beta$ to be as large as $\frac{\alpha}{2}$.

Figure 1: Model of a linear system; vector $x$ is $k$-sparse

If one has no freedom in the choice of the matrix $A$ (instead the matrix $A$ is rather given to us) then the
recovery problem (1) becomes NP-hard. The following two algorithms (and their different variations) are
then of special interest (and have certainly been the subject of extensive research in recent years):

1. Orthogonal matching pursuit - OMP

2. Basis pursuit - $\ell_1$-optimization.

Under certain probabilistic assumptions on the elements of $A$ it can be shown (see e.g. [35, 44, 45]) that
if $m = O(k \log(n))$ OMP (or slightly modified OMP) can recover $x$ in (1) with complexity of recovery
$O(n^2)$. On the other hand a stage-wise OMP from [22] recovers $x$ in (1) with complexity of recovery
$O(n \log n)$. Somewhere in between OMP and BP are the recent improvements CoSAMP (see e.g. [34]) and
Subspace pursuit (see e.g. [13]), which guarantee (assuming the linear regime) that the $k$-sparse $x$ in (1)
can be recovered in polynomial time with $m = O(k)$ equations.

In this paper we will focus on the second of the two above mentioned algorithms, i.e. we will focus on
the performance of $\ell_1$-optimization. (Variations of the standard $\ell_1$-optimization from e.g.
[10, 11, 40], as well as those from [14, 23, 26-28, 39] related to $\ell_q$-optimization, $0 < q < 1$, are
possible as well.) The basic $\ell_1$-optimization algorithm finds $x$ in (1) by solving the following
$\ell_1$-norm minimization problem

$$\min \|x\|_1 \quad \text{subject to} \quad Ax = y. \tag{2}$$

In the seminal work [8], it was established that for any constant $\alpha \in (0, 1)$ and $m = \alpha n$
there is a constant $\beta \in (0, \alpha)$ and $k = \beta n$ such that the solution of (2) is with
overwhelming probability the $k$-sparse $x$ in (1) (moreover, this remains true for any $k$-sparse $x$).
(Under overwhelming probability we in this paper assume a probability that is no more than a number
exponentially decaying in $n$ away from 1.) The results of [8] rested on having the matrix $A$ satisfy the
restricted isometry property (RIP), which is only a sufficient condition for $\ell_1$-optimization to
produce the solution of (1) (more on RIP and its importance can be found
in e.g. fflgl0|9l|38l). 
Instead of characterizing the $m \times n$ matrix $A$ through the RIP condition, in [15, 16] Donoho associates a
certain polytope with the matrix $A$. Namely, [15, 16] consider the polytope obtained by projecting the regular
$n$-dimensional cross-polytope by $A$. It turns out that a necessary and sufficient condition for (2) to
produce the $k$-sparse solution of (1) is that this polytope associated with the matrix $A$ is
$k$-neighborly [15, 16, 18, 19]. Using the results of [2, 6, 33, 37, 41, 46], it is further shown in [16]
that if $A$ is a random $m \times n$ ortho-projector matrix then with overwhelming probability the polytope
obtained by projecting the standard $n$-dimensional cross-polytope by $A$ is $k$-neighborly. The precise
relation between $m$ and $k$ in order for this to happen is characterized in [15, 16] as well.

It should be noted that one usually considers the success of (2) in recovering any given $k$-sparse $x$ in
(1). It is also of interest to consider the success of (2) in recovering almost any given $x$ in (1). Below
we make a distinction between these cases and recall some of the definitions from [16, 18, 20, 21, 42, 43].

Clearly, for any given constant $\alpha < 1$ there is a maximum allowable value of $\beta$ such that for any
given $k$-sparse $x$ in (1) the solution of (2) is exactly that given $k$-sparse $x$ with overwhelming
probability. We will refer to this maximum allowable value of $\beta$ as the strong threshold (see [16]).
Similarly, for any given constant $\alpha < 1$ and any given $x$ with a given fixed location of nonzero
components and a given fixed combination of its elements' signs there will be a maximum allowable value of
$\beta$ such that (2) finds that given $x$ in (1) with overwhelming probability. We will refer to this
maximum allowable value of $\beta$ as the weak threshold and will denote it by $\beta_w$ (see, e.g.
[42, 43]). In this paper we will provide a rigorous proof that the $\beta_w$ one can determine through
Donoho's framework from [16] is exactly the same as the $\beta_w$ determined in [43].

We organize the rest of the paper in the following way. In Section 2 we will first recall the basic
ingredients of the analysis done in [16]. Using the insights from [43] we will then give a closed formula
for $\beta_w$ computed in [16]. As hinted above, this formula will match the one computed in [43]. In
Section 3 we will then specialize the results from Section 2 to the case when the nonzero components of the
sparse vector $x$ in (1) are positive (or in general with a priori known signs). Using again the insights
from [43] we will then give a closed formula for $\beta_w$ computed for this case in [18]. This formula will
match the corresponding one computed in [43]. Finally, in Section 4 we discuss the obtained results.

2 General x 

2.1 Success of $\ell_1$ and neighborliness of projected cross-polytope

In this section we show that the weak thresholds obtained in [43] are the same as the ones obtained in [16].
To that end, we start by recalling the basics of the analysis from [15, 16]. In his, now legendary, paper
[15] Donoho took a geometric approach to the performance analysis of $\ell_1$-optimization and managed to
connect it to the concept of polytope neighborliness. The main recognition went along the following lines:
1) Let $C_p^n$ be the regular $n$-dimensional cross-polytope and let $AC_p^n$ be the polytope one obtains
after projecting $C_p^n$ by $A$; 2) Then the solution of (2) will be exactly the $k$-sparse solution of (1)
if and only if the polytope $AC_p^n$ is centrally $k$-neighborly (more on the definitions, importance, and
many incredible properties of neighborliness can be found in e.g. [16, 29]). Here we just briefly recall the
basic definitions of neighborliness and central neighborliness from [16]. Namely, a polytope is
$k$-neighborly if every $k + 1$ of its vertices span a $k$-dimensional face. On the other hand, a polytope
is centrally $k$-neighborly if every $k + 1$ of its vertices that do not include any antipodal pair span a
$k$-dimensional face.

The above characterization then enables one to replace studying the success of $\ell_1$-optimization in
solving an under-determined system by studying the neighborliness of projected cross-polytopes. Of course,
a priori, it is not really clear that the latter problem is any easier than the former one. However, it
turns out that it has been explored to some extent in the literature on the geometry of random
high-dimensional polytopes. Using the "sum of angles" result from [2] (which at its core relies on
[33, 41]) it was established in [16] that if $A$ is a random ortho-projector, $AC_p^n$ will be centrally
$k$-neighborly with overwhelming probability if

$$n^{-1} \log\left(C_{com}\, C_{int}(T^k, T^m)\, C_{ext}(F^m, C_p^n)\right) < 0 \tag{3}$$

where $C_{com} = 2^{m-k}\binom{n-k-1}{m-k}$, $C_{int}(T^k, T^m)$ is the internal angle at face $T^k$ of
$T^m$, $C_{ext}(F^m, C_p^n)$ is the external angle of $C_p^n$ at any $m$-dimensional face $F^m$, and $T^k$
and $T^m$ are the standard $k$ and $m$ dimensional simplices, respectively (more on the definitions and
meaning of the internal and external angles can be found in e.g. [29]). Donoho then proceeded by
establishing that (3) is equivalent to the following inequality related to the sum/difference of the
exponents of $C_{com}$, $C_{int}$, and $C_{ext}$:

$$\Psi_{net} = \Psi_{com} - \Psi_{int} - \Psi_{ext} < 0 \tag{4}$$

where

$$\begin{aligned}
\Psi_{com} &= n^{-1}\log(C_{com}) = (\alpha - \beta)\log(2) + (1 - \beta) H\left(\frac{\alpha - \beta}{1 - \beta}\right) \\
\Psi_{int} &= -n^{-1}\log(C_{int}(T^k, T^m)) \\
\Psi_{ext} &= -n^{-1}\log(C_{ext}(F^m, C_p^n))
\end{aligned} \tag{5}$$

and $H(p) = -p\log(p) - (1 - p)\log(1 - p)$ is the standard entropy function and
$\binom{n}{pn} \approx e^{nH(p)}$ is the standard approximation of the binomial factor by the entropy
function in the limit of $n \to \infty$. (The minus signs in the last two lines of (5) make $\Psi_{int}$
and $\Psi_{ext}$ the decay exponents of the two angles, which is what (4) requires in order to agree with
(3).) The rest of Donoho's approach is the analysis of the closed form expressions for
$C_{int}(T^k, T^m)$ and $C_{ext}(F^m, C_p^n)$ obtained/analyzed in various forms in [6, 37, 38]. In the
following two subsections we will separately consider the results Donoho established for the internal and
the external angle exponents. Relying on the insights from [43] we will provide neat characterizations of
the exponents that will eventually help us establish the equivalence of the results from [16] and [43].
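The entropy approximation behind the combinatorial exponent can be checked numerically. The sketch below is only an illustration and is not part of the original analysis: the function names are hypothetical, natural logarithms are assumed, and $m = \alpha n$, $k = \beta n$ are rounded to integers. It compares $n^{-1}\log\left(2^{m-k}\binom{n-k-1}{m-k}\right)$ at finite $n$ with its limiting form $(\alpha-\beta)\log(2) + (1-\beta)H\left(\frac{\alpha-\beta}{1-\beta}\right)$.

```python
import math


def entropy(p):
    """Entropy H(p) = -p log(p) - (1 - p) log(1 - p), natural logarithm."""
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def log_binomial(n, k):
    """Logarithm of the binomial coefficient C(n, k), computed via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def psi_com(alpha, beta):
    """Limiting exponent (alpha - beta) log 2 + (1 - beta) H((alpha - beta)/(1 - beta))."""
    return (alpha - beta) * math.log(2.0) + (1.0 - beta) * entropy((alpha - beta) / (1.0 - beta))


def psi_com_finite(n, alpha, beta):
    """Finite-n value n^{-1} log(2^{m-k} C(n-k-1, m-k)) with m = alpha n, k = beta n (rounded)."""
    m, k = round(alpha * n), round(beta * n)
    return ((m - k) * math.log(2.0) + log_binomial(n - k - 1, m - k)) / n


if __name__ == "__main__":
    # The gap between the two columns shrinks as n grows.
    for n in (10**2, 10**4, 10**6):
        print(n, psi_com_finite(n, 0.5, 0.2), psi_com(0.5, 0.2))
```

The gap between the finite-$n$ value and the limit is of order $\log(n)/n$, which is why only the exponents matter in the linear regime.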



2.2 Internal angle 

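Starting from the explicit formulas for internal angles given in [6], Donoho in [16], through a saddle-point
integral computation, established the following procedure for determining the exponent of the internal
angle $\Psi_{int}$. Let $\gamma = \frac{\beta}{\alpha}$ and for $s \geq 0$ let

$$\Phi(s) = \frac{1}{\sqrt{2\pi}} \int_s^\infty e^{-\frac{x^2}{2}}\, dx, \qquad \phi(s) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{s^2}{2}}. \tag{6}$$

Then one has

$$\Psi_{int}(\beta, \alpha) = (\alpha - \beta)\,\xi_\gamma(y_\gamma) + (\alpha - \beta)\log(2) \tag{7}$$

where

$$\begin{aligned}
y_\gamma &= \frac{\gamma}{1 - \gamma}\, s_\gamma \\
\xi_\gamma(y_\gamma) &= -\frac{1}{2}\, y_\gamma^2\, \frac{1 - \gamma}{\gamma} - \frac{1}{2}\log\left(\frac{2}{\pi}\right) + \log\left(\frac{y_\gamma}{\gamma}\right)
\end{aligned} \tag{8}$$

and $s_\gamma \geq 0$ is the solution of

$$\Phi(s) = (1 - \gamma)\,\frac{\phi(s)}{s}. \tag{9}$$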
s 

Now, if one can determine $s_\gamma$ then a combination of (7) and (8) would give a convenient closed form
expression for the exponent $\Psi_{int}(\beta, \alpha)$. Finding $s_\gamma$ amounts to nothing but solving
(9) over $s$, which for an unknown $\gamma$ could be incredibly hard. At this point we will make a "bold"
guess and say that $t = \frac{1 - \alpha}{1 - \beta}$ and

$$s_\gamma = \sqrt{2}\,\mathrm{erfinv}(t) = \sqrt{2}\,\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right) \tag{10}$$

where $\mathrm{erfinv}(\cdot)$ is the inverse of the error function $\mathrm{erf}(\cdot)$ associated with
the standard normal random variable ($\mathrm{erf}(r) = \frac{2}{\sqrt{\pi}}\int_0^r e^{-q^2}\, dq$). Of
course, it is rather hard to believe that $s_\gamma$ from (10) will be the solution of (9) for every
$\gamma = \frac{\beta}{\alpha}$. However, what we hope is that it may be the solution of (9) for the optimal
$\gamma = \frac{\beta_w}{\alpha}$, i.e. for the one for which the net exponent, $\Psi_{net}$, in (4) is zero
(strictly speaking, instead of "zero" we should say "smaller than $-\epsilon$ where $\epsilon > 0$ is
arbitrarily small"; in an effort to make the writing and the main ideas clearer we will throughout the rest
of the paper almost always ignore the $\epsilon$'s). Even a hope like this is fairly out of the blue and it
would require an enormous amount of intuition for one to come up with a hopeful guess like the one from
(10) just by staring at equations (6)-(9) and not knowing the results of [43].

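Now what is left to do is to confirm that our guess is actually right. We start by noting that
$\Phi(s) = \frac{1}{2}\left(1 - \mathrm{erf}\left(\frac{s}{\sqrt{2}}\right)\right)$. If (10) is to be
correct then to satisfy (9) one must have, with $t = \frac{1 - \alpha}{1 - \beta}$ as in (10),

$$\frac{1}{2}(1 - t) = (1 - \gamma)\,\frac{\phi(s_\gamma)}{s_\gamma} = (1 - \gamma)\,\frac{1}{\sqrt{2\pi}}\,\frac{e^{-(\mathrm{erfinv}(t))^2}}{\sqrt{2}\,\mathrm{erfinv}(t)}$$

or in a more convenient algebraic form

$$\frac{1 - \beta}{\alpha}\sqrt{\frac{2}{\pi}}\,\frac{e^{-\left(\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right)\right)^2}}{\sqrt{2}\,\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right)} = 1. \tag{11}$$

If it eventually turns out that for $\alpha$ and $\beta$ for which (11) holds one also has that $\Psi_{net}$
in (4) is zero then we could claim that the guess we made for $s_\gamma$ in (10) is actually correct.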
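As a numerical sanity check of this step, the sketch below solves the condition $\frac{1-\beta}{\alpha}\sqrt{\frac{2}{\pi}}\,\frac{e^{-(\mathrm{erfinv}(t))^2}}{\sqrt{2}\,\mathrm{erfinv}(t)} = 1$, $t = \frac{1-\alpha}{1-\beta}$, for $\beta$ at a given $\alpha$ and then confirms that $s = \sqrt{2}\,\mathrm{erfinv}(t)$ satisfies $\Phi(s) = (1-\gamma)\phi(s)/s$ at that pair. It is an illustration rather than part of the original argument: the helper names are hypothetical, `erfinv` is a plain bisection on the standard-library `math.erf`, and the bisection over $\beta$ relies on the left-hand side of the condition being decreasing in $\beta$ on $(0, \alpha)$.

```python
import math


def erfinv(t, iters=200):
    """Inverse of math.erf on (-1, 1), computed by bisection (standard library only)."""
    lo, hi = -6.0, 6.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def threshold_condition(alpha, beta):
    """Left-hand side of the weak-threshold condition minus 1 (zero at the threshold)."""
    e = erfinv((1.0 - alpha) / (1.0 - beta))
    lhs = (1.0 - beta) / alpha * math.sqrt(2.0 / math.pi) * math.exp(-e * e) / (math.sqrt(2.0) * e)
    return lhs - 1.0


def weak_threshold(alpha, iters=200):
    """Bisection over beta in (0, alpha); the condition is positive below the root."""
    lo, hi = 1e-12, alpha * (1.0 - 1e-12)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if threshold_condition(alpha, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def saddle_point_residual(alpha, beta):
    """Phi(s) - (1 - gamma) phi(s) / s at the guessed s = sqrt(2) erfinv((1-alpha)/(1-beta))."""
    gamma = beta / alpha
    s = math.sqrt(2.0) * erfinv((1.0 - alpha) / (1.0 - beta))
    Phi = 0.5 * (1.0 - math.erf(s / math.sqrt(2.0)))
    phi = math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)
    return Phi - (1.0 - gamma) * phi / s


if __name__ == "__main__":
    alpha = 0.5
    beta_w = weak_threshold(alpha)
    print(beta_w, saddle_point_residual(alpha, beta_w))
```

For a $\beta$ away from the root the residual is generally nonzero, which matches the remark that the guess is only hoped to work for the optimal pair.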

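We now proceed with the evaluation of the "internal exponent" $\Psi_{int}$ assuming that both (10) and (11)
are correct. Plugging (10) back in (8) we obtain

$$y_\gamma = \frac{\gamma}{1 - \gamma}\, s_\gamma = \frac{\beta}{\alpha - \beta}\,\sqrt{2}\,\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right). \tag{12}$$

Combining (8) and (12) further we have

$$\xi_\gamma(y_\gamma) = -\frac{1}{2}\,\frac{\beta}{\alpha - \beta}\left(\sqrt{2}\,\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right)\right)^2 - \frac{1}{2}\log\left(\frac{2}{\pi}\right) + \log\left(\frac{\alpha}{\alpha - \beta}\right) + \log\left(\sqrt{2}\,\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right)\right). \tag{13}$$

Finally, plugging $\xi_\gamma(y_\gamma)$ computed in (13) back in (7) we have for the exponent of the
internal angle

$$\begin{aligned}
\Psi_{int} = {} & -\frac{1}{2}\,\beta\left(\sqrt{2}\,\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right)\right)^2 - \frac{\alpha - \beta}{2}\log\left(\frac{2}{\pi}\right) + (\alpha - \beta)\log(\alpha) \\
& - (\alpha - \beta)\log(\alpha - \beta) + (\alpha - \beta)\log\left(\sqrt{2}\,\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right)\right) + (\alpha - \beta)\log(2).
\end{aligned} \tag{14}$$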

2.3 External angle 

In this subsection we provide results for the external angle that are analogous to those provided in the
previous subsection for the internal angle. In [16] Donoho established that the exponent of the external
angle can be computed in the following way

$$\Psi_{ext}(\beta, \alpha) = \min_{y}\left(\alpha y^2 - (1 - \alpha)\log(\mathrm{erf}(y))\right). \tag{15}$$

It was further shown in [16] that the function $\left(\alpha y^2 - (1 - \alpha)\log(\mathrm{erf}(y))\right)$
is smooth and convex. If one could solve the above minimization analytically then there would be a neat
expression for the exponent of the external angle. As in the previous subsection, solving this minimization
does not appear to be an easy task for any fixed $\alpha$ ($\beta$). However, we will again take a "bold"
guess and assume that the solution of the above minimization is

$$y_{ext} = \mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right). \tag{16}$$

It is of course unreasonable to expect that this choice of $y$ would be the solution of the minimization
problem in (15) for every given $\alpha$. However, we do hope that it could be the solution for the optimal
pair $(\alpha, \beta)$ (as stated above, the optimal pair $(\alpha, \beta)$ is the one that makes the net
exponent $\Psi_{net}$ in (4) equal to zero). If $y_{ext}$ defined above is to be the solution of the
minimization problem in (15) for the optimal pair $(\alpha, \beta)$ then at the very least one has to have
that

$$\frac{d\left(\alpha y^2 - (1 - \alpha)\log(\mathrm{erf}(y))\right)}{dy}\Bigg|_{y = y_{ext}} = 0. \tag{17}$$

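We proceed with checking whether (17) indeed holds. To that end we have:

$$\begin{aligned}
\frac{d\left(\alpha y^2 - (1 - \alpha)\log(\mathrm{erf}(y))\right)}{dy}\Bigg|_{y = y_{ext}}
&= \left(2\alpha y - \frac{1 - \alpha}{\mathrm{erf}(y)}\,\frac{d\,\mathrm{erf}(y)}{dy}\right)\Bigg|_{y = y_{ext}} \\
&= 2\alpha\,\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right) - (1 - \beta)\,\frac{2}{\sqrt{\pi}}\, e^{-\left(\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right)\right)^2} \\
&= \sqrt{2}\left(\sqrt{2}\,\alpha\,\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right) - (1 - \beta)\sqrt{\frac{2}{\pi}}\, e^{-\left(\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right)\right)^2}\right) = 0
\end{aligned} \tag{18}$$

where the last equality follows by our assumption that $(\alpha, \beta)$ are optimal and therefore satisfy
(11). Essentially, (18) shows that if (11) is correct then (16) is correct as well.

Combination of (15) and (16) then gives us the following convenient characterization of the "external
exponent" $\Psi_{ext}$:

$$\Psi_{ext} = \alpha\, y_{ext}^2 - (1 - \alpha)\log(\mathrm{erf}(y_{ext})) = \alpha\left(\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right)\right)^2 - (1 - \alpha)\log\left(\frac{1 - \alpha}{1 - \beta}\right). \tag{19}$$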
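The stationarity argument for the external angle can also be checked by brute force. The sketch below is an illustration under stated assumptions: the function names are hypothetical, `erfinv` is a bisection on `math.erf`, and the optimal pair is obtained by solving $\frac{1-\beta}{\alpha}\sqrt{\frac{2}{\pi}}\,\frac{e^{-(\mathrm{erfinv}(t))^2}}{\sqrt{2}\,\mathrm{erfinv}(t)} = 1$ with $t = \frac{1-\alpha}{1-\beta}$. It minimizes $\alpha y^2 - (1-\alpha)\log(\mathrm{erf}(y))$ with a golden-section search and compares the numerical minimizer with the guess $y_{ext} = \mathrm{erfinv}\left(\frac{1-\alpha}{1-\beta}\right)$.

```python
import math


def erfinv(t, iters=200):
    """Inverse of math.erf on (-1, 1), computed by bisection (standard library only)."""
    lo, hi = -6.0, 6.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def weak_threshold(alpha, iters=200):
    """Bisection over beta for the weak-threshold condition (decreasing in beta)."""
    def cond(beta):
        e = erfinv((1.0 - alpha) / (1.0 - beta))
        return (1.0 - beta) / alpha * math.sqrt(2.0 / math.pi) * math.exp(-e * e) / (math.sqrt(2.0) * e) - 1.0
    lo, hi = 1e-12, alpha * (1.0 - 1e-12)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cond(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ext_objective(alpha, y):
    """Objective alpha y^2 - (1 - alpha) log(erf(y)) of the external-angle minimization."""
    return alpha * y * y - (1.0 - alpha) * math.log(math.erf(y))


def golden_section_min(f, lo, hi, iters=200):
    """Minimizer of a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


def check_external_guess(alpha):
    """Return (guessed minimizer, numerical minimizer) at the optimal pair for this alpha."""
    beta_w = weak_threshold(alpha)
    y_guess = erfinv((1.0 - alpha) / (1.0 - beta_w))
    y_num = golden_section_min(lambda y: ext_objective(alpha, y), 1e-6, 5.0)
    return y_guess, y_num


if __name__ == "__main__":
    print(check_external_guess(0.5))
```

Golden-section search only resolves the minimizer to roughly the square root of machine precision, so agreement to about seven digits is the most one should expect from this check.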
2.4 Net exponent

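In this section we combine the expressions for the "internal" and "external" exponents obtained in (14) and
(19), respectively, with the expression for the "combinatorial" exponent given in (5). Before proceeding
further with this exponent combination we first slightly modify the expression for the combinatorial
exponent given in (5):

$$\begin{aligned}
\Psi_{com} &= (\alpha - \beta)\log(2) + (1 - \beta)\, H\left(\frac{\alpha - \beta}{1 - \beta}\right) \\
&= (\alpha - \beta)\log(2) + (1 - \beta)\left(-\frac{\alpha - \beta}{1 - \beta}\log\left(\frac{\alpha - \beta}{1 - \beta}\right) - \frac{1 - \alpha}{1 - \beta}\log\left(\frac{1 - \alpha}{1 - \beta}\right)\right) \\
&= (\alpha - \beta)\log(2) - (\alpha - \beta)\log\left(\frac{\alpha - \beta}{1 - \beta}\right) - (1 - \alpha)\log\left(\frac{1 - \alpha}{1 - \beta}\right).
\end{aligned} \tag{20}$$

Plugging the results from (14), (19), and (20) back in (4) one has

$$\begin{aligned}
\Psi_{net} = {} & \Psi_{com} - \Psi_{int} - \Psi_{ext} \\
= {} & (\alpha - \beta)\log(2) - (\alpha - \beta)\log\left(\frac{\alpha - \beta}{1 - \beta}\right) - (1 - \alpha)\log\left(\frac{1 - \alpha}{1 - \beta}\right) \\
& - \Bigg(-\frac{1}{2}\,\beta\left(\sqrt{2}\,\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right)\right)^2 - \frac{\alpha - \beta}{2}\log\left(\frac{2}{\pi}\right) \\
& \qquad + (\alpha - \beta)\log(\alpha) - (\alpha - \beta)\log(\alpha - \beta) + (\alpha - \beta)\log\left(\sqrt{2}\,\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right)\right) + (\alpha - \beta)\log(2)\Bigg) \\
& - \left(\alpha\left(\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right)\right)^2 - (1 - \alpha)\log\left(\frac{1 - \alpha}{1 - \beta}\right)\right).
\end{aligned} \tag{21}$$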
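After canceling all terms that can be canceled one finally has

$$\begin{aligned}
\Psi_{net} &= (\alpha - \beta)\log\left(\frac{1 - \beta}{\alpha}\right) + \frac{\alpha - \beta}{2}\log\left(\frac{2}{\pi}\right) - (\alpha - \beta)\log\left(\sqrt{2}\,\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right)\right) - (\alpha - \beta)\left(\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right)\right)^2 \\
&= (\alpha - \beta)\left(\log\left(\frac{1 - \beta}{\alpha}\right) + \log\left(\sqrt{\frac{2}{\pi}}\right) - \log\left(\sqrt{2}\,\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right)\right) + \log\left(e^{-\left(\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right)\right)^2}\right)\right) \\
&= (\alpha - \beta)\log\left(\frac{1 - \beta}{\alpha}\sqrt{\frac{2}{\pi}}\,\frac{e^{-\left(\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right)\right)^2}}{\sqrt{2}\,\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta}\right)}\right) \\
&= 0
\end{aligned} \tag{22}$$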

where the last equality follows by assumption (11). Since we obtained that $\Psi_{net} = 0$ and we never
contradicted assumption (11), the assumption must be correct. To be completely rigorous one should add that
if an $\alpha$ is given and $\beta_w$ is such that the pair $(\alpha, \beta_w)$ satisfies (11) then for any
$\beta < \beta_w$ the polytope $AC_p^n$ is centrally $\beta n$-neighborly (i.e. one needs $\beta$ to be
strictly less than $\beta_w$ because (4) asserts that one actually needs $\Psi_{net} < 0$).
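The cancellation leading to a zero net exponent can be confirmed numerically by evaluating the three exponents separately. The sketch below is illustrative only: the function names are hypothetical and `erfinv` is a bisection on `math.erf`. It uses the closed forms derived above for the combinatorial, internal, and external exponents (with $e = \mathrm{erfinv}\left(\frac{1-\alpha}{1-\beta}\right)$) and evaluates $\Psi_{com} - \Psi_{int} - \Psi_{ext}$ at a pair $(\alpha, \beta_w)$ that solves the weak-threshold condition.

```python
import math


def erfinv(t, iters=200):
    """Inverse of math.erf on (-1, 1), computed by bisection (standard library only)."""
    lo, hi = -6.0, 6.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def weak_threshold(alpha, iters=200):
    """Bisection over beta for the weak-threshold condition (decreasing in beta)."""
    def cond(beta):
        e = erfinv((1.0 - alpha) / (1.0 - beta))
        return (1.0 - beta) / alpha * math.sqrt(2.0 / math.pi) * math.exp(-e * e) / (math.sqrt(2.0) * e) - 1.0
    lo, hi = 1e-12, alpha * (1.0 - 1e-12)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cond(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def psi_com(alpha, beta):
    """Combinatorial exponent in its expanded form."""
    return ((alpha - beta) * math.log(2.0)
            - (alpha - beta) * math.log((alpha - beta) / (1.0 - beta))
            - (1.0 - alpha) * math.log((1.0 - alpha) / (1.0 - beta)))


def psi_int(alpha, beta):
    """Internal-angle exponent evaluated at the guessed saddle point."""
    s = math.sqrt(2.0) * erfinv((1.0 - alpha) / (1.0 - beta))
    return (-0.5 * beta * s * s
            - 0.5 * (alpha - beta) * math.log(2.0 / math.pi)
            + (alpha - beta) * math.log(alpha)
            - (alpha - beta) * math.log(alpha - beta)
            + (alpha - beta) * math.log(s)
            + (alpha - beta) * math.log(2.0))


def psi_ext(alpha, beta):
    """External-angle exponent evaluated at the guessed minimizer."""
    e = erfinv((1.0 - alpha) / (1.0 - beta))
    return alpha * e * e - (1.0 - alpha) * math.log((1.0 - alpha) / (1.0 - beta))


def psi_net(alpha, beta):
    """Net exponent Psi_com - Psi_int - Psi_ext."""
    return psi_com(alpha, beta) - psi_int(alpha, beta) - psi_ext(alpha, beta)


if __name__ == "__main__":
    for alpha in (0.3, 0.5, 0.7):
        beta_w = weak_threshold(alpha)
        print(alpha, beta_w, psi_net(alpha, beta_w))
```

These closed forms rely on the guessed $s_\gamma$ and $y_{ext}$, so they should only be trusted at the optimal pair; away from it the true exponents require solving the saddle-point equation and the minimization numerically.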
We summarize the results from this section in the following theorem. 

Theorem 1. (Geometry-probability equivalence - General x) Let $A$ in (1) be an $m \times n$ ortho-projector
(or an $m \times n$ matrix with the null-space uniformly distributed in the Grassmannian). Let $k, m, n$ be
large and let $\alpha = \frac{m}{n}$ and $\beta_w = \frac{k}{n}$ be constants independent of $m$ and $n$.
Let erfinv be the inverse of the standard error function associated with a zero-mean unit variance Gaussian
random variable. Further, let $\alpha$ and $\beta_w$ be such that

$$\frac{1 - \beta_w}{\alpha}\sqrt{\frac{2}{\pi}}\,\frac{e^{-\left(\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta_w}\right)\right)^2}}{\sqrt{2}\,\mathrm{erfinv}\left(\frac{1 - \alpha}{1 - \beta_w}\right)} = 1. \tag{23}$$

Then with overwhelming probability the polytope $AC_p^n$ will be centrally $\beta n$-neighborly for any
$\beta < \beta_w$.

Further, let the unknown $x$ in (1) be $k$-sparse and let the location and signs of the nonzero elements of
$x$ be arbitrarily chosen but fixed. Then, as shown in [43], for any $\beta < \beta_w$ one with overwhelming
probability has that the solution of (2) is exactly the $\beta n$-sparse $x$ in (1).

Proof. Follows from the previous discussion through a combination of (11), (22), and the main results
of [43]. □

The results for the weak threshold obtained from the above theorem have already been plotted in [43] and,
as was mentioned in [43], they were in excellent numerical agreement with the ones obtained in [15, 16]
(for completeness we present the results again in Figure 2). Finally, Theorem 1 rigorously establishes that
the agreement is not only numerical but also analytical and that the weak thresholds obtained in [16] and
[43] are indeed exactly equal to each other.




Figure 2: Weak threshold, $\ell_1$-optimization (horizontal axis: $\alpha$)
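The weak-threshold curve can be tabulated directly from the condition of Theorem 1. The sketch below is an illustration, not the code used to produce the original figure: the function names are hypothetical, the grid $\alpha = 0.1, \ldots, 0.9$ simply mirrors the axis ticks of Figure 2, and `erfinv` is a bisection on the standard-library `math.erf`.

```python
import math


def erfinv(t, iters=200):
    """Inverse of math.erf on (-1, 1), computed by bisection (standard library only)."""
    lo, hi = -6.0, 6.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def weak_threshold(alpha, iters=200):
    """Solve the Theorem 1 condition for beta_w by bisection over (0, alpha)."""
    def cond(beta):
        e = erfinv((1.0 - alpha) / (1.0 - beta))
        return (1.0 - beta) / alpha * math.sqrt(2.0 / math.pi) * math.exp(-e * e) / (math.sqrt(2.0) * e) - 1.0
    lo, hi = 1e-12, alpha * (1.0 - 1e-12)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cond(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def threshold_table():
    """List of (alpha, beta_w) pairs on the grid alpha = 0.1, 0.2, ..., 0.9."""
    return [(i / 10.0, weak_threshold(i / 10.0)) for i in range(1, 10)]


if __name__ == "__main__":
    for alpha, beta_w in threshold_table():
        print(f"alpha = {alpha:.1f}   beta_w = {beta_w:.4f}   beta_w/alpha = {beta_w / alpha:.4f}")
```

Printing the ratio $\beta_w/\alpha$ alongside $\beta_w$ makes it easier to compare with tables that report thresholds as sparsity relative to the number of equations.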



3 Nonnegative x 

3.1 Success of $\ell_1$ and neighborliness of projected simplices

In this section we consider a special case of (1). We will assume that the nonzero components of $x$ in (1)
are all of the same sign (say they are all positive). If this is a priori known then instead of using (2) to
recover the "nonnegative" $x$ in (1) one can use (see, e.g. [18, 19, 43])

$$\begin{aligned}
\min \quad & \|x\|_1 \\
\text{subject to} \quad & Ax = y \\
& x_i \geq 0, \quad 1 \leq i \leq n.
\end{aligned} \tag{24}$$

Since, so to say, more structure is imposed on $x$ (and this structure is made known to the system's solver)
one would expect that the recoverable thresholds should be higher in this case than they were in the case
of "general" sparse vectors $x$. As demonstrated in [18, 19, 43] the thresholds are indeed higher. Moreover,
although the results of [19] and [43] were obtained through completely different approaches, as
demonstrated in [43] they happened to be in excellent numerical agreement. In this section we will
rigorously show that the agreement is not only numerical but also algebraic/analytical.

Before proceeding further we quickly recall and appropriately modify the definition of the weak
threshold. The definition of the weak threshold was already introduced in Section 1. However, that
definition was suited for the recovery of general vectors $x$ considered in the previous section. Here, we
slightly modify it so that it fits the scenario of a priori known sign patterns of nonzero elements of $x$.
For any given constant $\alpha < 1$ and any given $x$ with a given fixed location of nonzero components and
for which it is known that its nonzero components are, say, positive there will be a maximum allowable
value of $\beta$ such that (24) finds that given $x$ in (1) with overwhelming probability. We will refer to
this maximum allowable value of $\beta$ as the nonnegative weak threshold and will denote it by
$\beta_w^+$.

The story again starts from Donoho's classic [15]. In the follow-up [19] Donoho and Tanner made a key
observation that a majority of what was done in [15] and was related to the regular $n$-dimensional
cross-polytope would continue to hold, in a slightly modified way, if translated to the standard
$n$-dimensional simplex. In more mundane language, in [19] Donoho and Tanner again took a geometric
approach, but this time to the performance analysis of the $\ell_1$-optimization from (24), and again
managed to establish a connection between the performance analysis of (24) and the concept of polytope
neighborliness. This time the main recognition went along slightly different lines: 1) Let $T^n$ be the
standard $n$-dimensional simplex and let $AT^n$ be the polytope one obtains after projecting $T^n$ by $A$;
2) Then the solution of (24) will be exactly the nonnegative $k$-sparse solution of (1) if and only if the
polytope $AT^n$ is $k$-neighborly. For completeness we just briefly recall that a polytope is
$k$-neighborly if every $k + 1$ of its vertices span a $k$-dimensional face.

As in the previous section, the above characterization then enables one to replace studying the success of
(24) in recovering the nonnegative sparse solution of an under-determined system by studying the
neighborliness of the projected standard simplex. Of course, as earlier, it is not, a priori, clear that the
latter problem is any easier than the former one. However, knowing the results of the previous section (and
ultimately of course those of [15]) one could now be tempted to believe that the polytope type of
characterization could be manageable. As discovered in [18], it turns out that the neighborliness of
randomly projected simplices has been explored to some extent in the literature on the geometry of random
high-dimensional polytopes. As in the previous section, using the "sum of angles" result from [2, 33, 41]
it was established in [18] that if $A$ is a random ortho-projector, $AT^n$ will be $k$-neighborly with
overwhelming probability if

$$n^{-1}\log\left(C^+_{com}\, C^+_{int}(T^k, T^m)\, C^+_{ext}(T^m, T^{n-1})\right) < 0 \tag{25}$$

where $C^+_{com} = \binom{n-k-1}{m-k}$, $C^+_{int}(T^k, T^m)$ is the internal angle at face $T^k$ of $T^m$,
$C^+_{ext}(T^m, T^{n-1})$ is the external angle of $T^{n-1}$ at face $T^m$, and $T^k$, $T^m$, and $T^{n-1}$
are the standard $k$, $m$, and $(n - 1)$ dimensional simplices, respectively. The authors in [18] then
proceeded by establishing that (25) is equivalent to the following inequality related to the
sum/difference of the exponents of $C^+_{com}$, $C^+_{int}$, and $C^+_{ext}$:

$$\Psi^+_{net} = \Psi^+_{com} - \Psi^+_{int} - \Psi^+_{ext} < 0 \tag{26}$$
where



$$\begin{aligned}
\Psi^+_{com} &= n^{-1}\log(C^+_{com}) = (1 - \beta)\, H\left(\frac{\alpha - \beta}{1 - \beta}\right) \\
\Psi^+_{int} &= -n^{-1}\log(C^+_{int}(T^k, T^m)) \\
\Psi^+_{ext} &= -n^{-1}\log(C^+_{ext}(T^m, T^{n-1}))
\end{aligned} \tag{27}$$

and as earlier $H(p) = -p\log(p) - (1 - p)\log(1 - p)$ is the standard entropy function and
$\binom{n}{pn} \approx e^{nH(p)}$ is the standard approximation of the binomial factor by the entropy
function in the limit of $n \to \infty$ (the minus signs in the last two lines of (27) make $\Psi^+_{int}$
and $\Psi^+_{ext}$ the decay exponents of the two angles, as (26) requires in order to agree with (25)).
The rest of the approach from [18] is the analysis of the closed form expressions for
$C^+_{int}(T^k, T^m)$ and $C^+_{ext}(T^m, T^{n-1})$ obtained/analyzed in various forms in [6, 37, 38]. In
the following two subsections we will separately consider the results Donoho and Tanner established for the
internal and the external angle exponents. Relying on the insights from [43] we will provide convenient
characterizations of the exponents that will eventually help us establish the rigorous equivalence of the
results from [18] and [43].



3.2 Internal angle — nonnegative x 

Just by looking at formulas (4) and (26) one can hardly see any difference between the definition of the
internal angle that we have in this section and the one that we had in the previous section. The
definitions are indeed the same and the angle's exponents are indeed the same. However, the
characterizations that we will provide will differ.

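To that end we recall the procedure for determining the exponent of the internal angle $\Psi^+_{int}$ (the
procedure is of course the same as the one from the previous section and is ultimately the one introduced
in [16]). As earlier, let $\gamma = \frac{\beta}{\alpha}$ and for $s \geq 0$ let

$$\Phi(s) = \frac{1}{\sqrt{2\pi}} \int_s^\infty e^{-\frac{x^2}{2}}\, dx, \qquad \phi(s) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{s^2}{2}}. \tag{28}$$

Then one has

$$\Psi^+_{int}(\beta, \alpha) = (\alpha - \beta)\,\xi^+_\gamma(y^+_\gamma) + (\alpha - \beta)\log(2) \tag{29}$$

where

$$\begin{aligned}
y^+_\gamma &= \frac{\gamma}{1 - \gamma}\, s^+_\gamma \\
\xi^+_\gamma(y^+_\gamma) &= -\frac{1}{2}\,(y^+_\gamma)^2\, \frac{1 - \gamma}{\gamma} - \frac{1}{2}\log\left(\frac{2}{\pi}\right) + \log\left(\frac{y^+_\gamma}{\gamma}\right)
\end{aligned} \tag{30}$$

and $s^+_\gamma \geq 0$ is the solution of

$$\Phi(s) = (1 - \gamma)\,\frac{\phi(s)}{s}. \tag{31}$$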



As earlier, if one can determine $s^+_\gamma$ then a combination of (29) and (30) would give a convenient
closed form expression for the exponent $\Psi^+_{int}(\beta, \alpha)$. Finding $s^+_\gamma$ is equivalent to
solving (31) over $s$, which for a generic $\gamma$ seems to be possible only numerically. As we have done
in the previous section, we will at this point again make a guess and say that
$t^+ = 2\frac{1 - \alpha}{1 - \beta} - 1$ and

$$s^+_\gamma = \sqrt{2}\,\mathrm{erfinv}(t^+) = \sqrt{2}\,\mathrm{erfinv}\left(2\,\frac{1 - \alpha}{1 - \beta} - 1\right) \tag{32}$$

where as earlier $\mathrm{erfinv}(\cdot)$ is the inverse of the error function $\mathrm{erf}(\cdot)$
associated with the standard normal random variable (i.e. $\mathrm{erfinv}(\cdot)$ is the inverse of
$\mathrm{erf}(r) = \frac{2}{\sqrt{\pi}}\int_0^r e^{-q^2}\, dq$). Of course, as was the case earlier,
$s^+_\gamma$ from (32) will not be the solution of (31) for every $\gamma = \frac{\beta}{\alpha}$. However,
we will again hope that it may be the solution of (31) for the optimal $\gamma = \frac{\beta^+_w}{\alpha}$,
i.e. for the one for which the net exponent, $\Psi^+_{net}$, in (26) is zero (again, strictly speaking,
instead of "zero" we should say "smaller than $-\epsilon$ where $\epsilon > 0$ is arbitrarily small"). This
hope does not seem any more likely to succeed than the one we made in the previous section unless one is
aware of the results of [43].

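As was the case in the previous section, we proceed by trying to confirm that our guess is actually right.
We again start by noting that
$\Phi(s) = \frac{1}{2}\left(1 - \mathrm{erf}\left(\frac{s}{\sqrt{2}}\right)\right)$. If (32) is to be
correct then to satisfy (31) one must have, with $t^+ = 2\frac{1 - \alpha}{1 - \beta} - 1$ as in (32),

$$\frac{1}{2}(1 - t^+) = (1 - \gamma)\,\frac{1}{\sqrt{2\pi}}\,\frac{e^{-(\mathrm{erfinv}(t^+))^2}}{\sqrt{2}\,\mathrm{erfinv}(t^+)}$$

or in a more convenient algebraic form

$$\frac{1 - \beta}{\alpha}\sqrt{\frac{1}{2\pi}}\,\frac{e^{-\left(\mathrm{erfinv}\left(2\frac{1 - \alpha}{1 - \beta} - 1\right)\right)^2}}{\sqrt{2}\,\mathrm{erfinv}\left(2\frac{1 - \alpha}{1 - \beta} - 1\right)} = 1. \tag{33}$$

Our logic from the previous section still remains in place. Namely, if it eventually turns out that for
$\alpha$ and $\beta$ for which (33) holds one also has that $\Psi^+_{net}$ in (26) is zero then we could
claim that the guess we made for $s^+_\gamma$ in (32) is actually correct.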
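The nonnegative case can be checked in the same way as the general one. The sketch below is illustrative: the function names are hypothetical and `erfinv` is a bisection on `math.erf`. It solves $\frac{1-\beta}{\alpha}\sqrt{\frac{1}{2\pi}}\,\frac{e^{-(\mathrm{erfinv}(t^+))^2}}{\sqrt{2}\,\mathrm{erfinv}(t^+)} = 1$ with $t^+ = 2\frac{1-\alpha}{1-\beta} - 1$ for $\beta$, and then evaluates the residual of $\Phi(s) = (1-\gamma)\phi(s)/s$ at $s = \sqrt{2}\,\mathrm{erfinv}(t^+)$. The search is restricted to $\beta > \max(0, 2\alpha - 1)$, where $t^+ > 0$ and the left-hand side is decreasing in $\beta$; this bracketing is an assumption of the sketch.

```python
import math


def erfinv(t, iters=200):
    """Inverse of math.erf on (-1, 1), computed by bisection (standard library only)."""
    lo, hi = -6.0, 6.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def nonneg_condition(alpha, beta):
    """Left-hand side of the nonnegative weak-threshold condition minus 1."""
    e = erfinv(2.0 * (1.0 - alpha) / (1.0 - beta) - 1.0)
    lhs = (1.0 - beta) / alpha * math.sqrt(1.0 / (2.0 * math.pi)) * math.exp(-e * e) / (math.sqrt(2.0) * e)
    return lhs - 1.0


def nonneg_weak_threshold(alpha, iters=200):
    """Bisection over beta in (max(0, 2 alpha - 1), alpha), where t+ stays in (0, 1)."""
    lo = max(0.0, 2.0 * alpha - 1.0) + 1e-12
    hi = alpha * (1.0 - 1e-12)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if nonneg_condition(alpha, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def saddle_point_residual(alpha, beta):
    """Phi(s) - (1 - gamma) phi(s) / s at the guessed s = sqrt(2) erfinv(t+)."""
    gamma = beta / alpha
    s = math.sqrt(2.0) * erfinv(2.0 * (1.0 - alpha) / (1.0 - beta) - 1.0)
    Phi = 0.5 * (1.0 - math.erf(s / math.sqrt(2.0)))
    phi = math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)
    return Phi - (1.0 - gamma) * phi / s


if __name__ == "__main__":
    alpha = 0.5
    beta_plus = nonneg_weak_threshold(alpha)
    print(beta_plus, saddle_point_residual(alpha, beta_plus))
```

At $\alpha = 0.5$ the root found this way is noticeably larger than the one for general $x$, in line with the remark that a priori sign information raises the recoverable threshold.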

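We now proceed with the evaluation of the "internal exponent" $\Psi^+_{int}$ assuming that both (32) and
(33) are correct. Plugging (32) back in (30) we obtain

$$y^+_\gamma = \frac{\gamma}{1 - \gamma}\, s^+_\gamma = \frac{\beta}{\alpha - \beta}\,\sqrt{2}\,\mathrm{erfinv}\left(2\,\frac{1 - \alpha}{1 - \beta} - 1\right). \tag{34}$$

Further combination of (30) and (34) gives

$$\begin{aligned}
\xi^+_\gamma(y^+_\gamma) &= -\frac{1}{2}\,(y^+_\gamma)^2\,\frac{1 - \gamma}{\gamma} - \frac{1}{2}\log\left(\frac{2}{\pi}\right) + \log\left(\frac{y^+_\gamma}{\gamma}\right) \\
&= -\frac{1}{2}\,\frac{\beta}{\alpha - \beta}\left(\sqrt{2}\,\mathrm{erfinv}\left(2\,\frac{1 - \alpha}{1 - \beta} - 1\right)\right)^2 - \frac{1}{2}\log\left(\frac{2}{\pi}\right) + \log\left(\frac{\alpha}{\alpha - \beta}\right) + \log\left(\sqrt{2}\,\mathrm{erfinv}\left(2\,\frac{1 - \alpha}{1 - \beta} - 1\right)\right).
\end{aligned} \tag{35}$$

Finally, plugging $\xi^+_\gamma(y^+_\gamma)$ computed in (35) back in (29) we have for the exponent of the
internal angle

$$\begin{aligned}
\Psi^+_{int} = {} & -\frac{1}{2}\,\beta\left(\sqrt{2}\,\mathrm{erfinv}\left(2\,\frac{1 - \alpha}{1 - \beta} - 1\right)\right)^2 - \frac{\alpha - \beta}{2}\log\left(\frac{2}{\pi}\right) + (\alpha - \beta)\log(\alpha) \\
& - (\alpha - \beta)\log(\alpha - \beta) + (\alpha - \beta)\log\left(\sqrt{2}\,\mathrm{erfinv}\left(2\,\frac{1 - \alpha}{1 - \beta} - 1\right)\right) + (\alpha - \beta)\log(2).
\end{aligned} \tag{36}$$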



3.3 External angle 

In this subsection we provide the external angle counterparts to the results provided in the previous
subsection for the internal angle. In [18] Donoho and Tanner established that the exponent of the external
angle can be computed in the following way

$$\Psi^+_{ext}(\beta, \alpha) = \min_{y}\left(\alpha y^2 - (1 - \alpha)\log\left(\frac{1}{2}(1 + \mathrm{erf}(y))\right)\right). \tag{37}$$

It was also shown in [18] that the function
$\left(\alpha y^2 - (1 - \alpha)\log\left(\frac{1}{2}(1 + \mathrm{erf}(y))\right)\right)$ is smooth and
convex. As earlier, solving the above minimization analytically does not appear to be an easy task for a
generic fixed $\alpha$. However, we will again take a guess and assume that the solution of the above
minimization is

$$y^+_{ext} = \mathrm{erfinv}\left(2\,\frac{1 - \alpha}{1 - \beta} - 1\right). \tag{38}$$

It is again of course unreasonable to expect that this choice of $y$ would be the solution of the
minimization problem in (37) for every given $\alpha$. However, we do hope that it could be the solution for
the optimal pair $(\alpha, \beta)$ (as stated above, the optimal pair $(\alpha, \beta)$ is the one that
makes the net exponent $\Psi^+_{net}$ in (26) equal to zero). If $y^+_{ext}$ defined above is to be the
solution of the minimization problem in (37) for the optimal pair $(\alpha, \beta)$ then at the very least
one has to have that

$$\frac{d\left(\alpha y^2 - (1 - \alpha)\log\left(\frac{1}{2}(1 + \mathrm{erf}(y))\right)\right)}{dy}\Bigg|_{y = y^+_{ext}} = 0. \tag{39}$$

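To check whether (39) holds or not we write:

$$\begin{aligned}
\frac{d\left(\alpha y^2 - (1 - \alpha)\log\left(\frac{1}{2}(1 + \mathrm{erf}(y))\right)\right)}{dy}\Bigg|_{y = y^+_{ext}}
&= \left(2\alpha y - \frac{1 - \alpha}{\frac{1}{2}(1 + \mathrm{erf}(y))}\,\frac{d\,\mathrm{erf}(y)}{2\,dy}\right)\Bigg|_{y = y^+_{ext}} \\
&= 2\alpha\,\mathrm{erfinv}\left(2\,\frac{1 - \alpha}{1 - \beta} - 1\right) - (1 - \beta)\,\frac{1}{\sqrt{\pi}}\, e^{-(y^+_{ext})^2} \\
&= \sqrt{2}\left(\sqrt{2}\,\alpha\,\mathrm{erfinv}\left(2\,\frac{1 - \alpha}{1 - \beta} - 1\right) - (1 - \beta)\sqrt{\frac{1}{2\pi}}\, e^{-\left(\mathrm{erfinv}\left(2\frac{1 - \alpha}{1 - \beta} - 1\right)\right)^2}\right) = 0
\end{aligned} \tag{40}$$

where the last equality follows by our assumption that $(\alpha, \beta)$ are optimal and therefore satisfy
(33). Since (40) shows that (39) indeed holds, one then has that if (33) is correct then (38) is correct as
well.

Combination of (37) and (38) then gives us the following convenient characterization of the "external
exponent" $\Psi^+_{ext}$:

$$\Psi^+_{ext} = \alpha\,(y^+_{ext})^2 - (1 - \alpha)\log\left(\frac{1}{2}(1 + \mathrm{erf}(y^+_{ext}))\right) = \alpha\left(\mathrm{erfinv}\left(2\,\frac{1 - \alpha}{1 - \beta} - 1\right)\right)^2 - (1 - \alpha)\log\left(\frac{1 - \alpha}{1 - \beta}\right). \tag{41}$$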



3.4 Net exponent 



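In this section we combine the expressions for the "internal" and "external" exponents obtained in (36) and
(41), respectively, with the expression for the "combinatorial" exponent given in (27). Before proceeding
further we first slightly modify the expression for the combinatorial exponent given in (27):

$$\begin{aligned}
\Psi^+_{com} &= (1 - \beta)\, H\left(\frac{\alpha - \beta}{1 - \beta}\right) \\
&= (1 - \beta)\left(-\frac{\alpha - \beta}{1 - \beta}\log\left(\frac{\alpha - \beta}{1 - \beta}\right) - \frac{1 - \alpha}{1 - \beta}\log\left(\frac{1 - \alpha}{1 - \beta}\right)\right) \\
&= -(\alpha - \beta)\log\left(\frac{\alpha - \beta}{1 - \beta}\right) - (1 - \alpha)\log\left(\frac{1 - \alpha}{1 - \beta}\right).
\end{aligned} \tag{42}$$

Plugging the results from (36), (41), and (42) back in (26) one has

$$\begin{aligned}
\Psi^+_{net} = {} & \Psi^+_{com} - \Psi^+_{int} - \Psi^+_{ext} \\
= {} & -(\alpha - \beta)\log\left(\frac{\alpha - \beta}{1 - \beta}\right) - (1 - \alpha)\log\left(\frac{1 - \alpha}{1 - \beta}\right) \\
& - \Bigg(-\frac{1}{2}\,\beta\left(\sqrt{2}\,\mathrm{erfinv}\left(2\,\frac{1 - \alpha}{1 - \beta} - 1\right)\right)^2 - \frac{\alpha - \beta}{2}\log\left(\frac{2}{\pi}\right) \\
& \qquad + (\alpha - \beta)\log(\alpha) - (\alpha - \beta)\log(\alpha - \beta) + (\alpha - \beta)\log\left(\sqrt{2}\,\mathrm{erfinv}\left(2\,\frac{1 - \alpha}{1 - \beta} - 1\right)\right) + (\alpha - \beta)\log(2)\Bigg) \\
& - \left(\alpha\left(\mathrm{erfinv}\left(2\,\frac{1 - \alpha}{1 - \beta} - 1\right)\right)^2 - (1 - \alpha)\log\left(\frac{1 - \alpha}{1 - \beta}\right)\right).
\end{aligned} \tag{43}$$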
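After canceling all terms that can be canceled one finally has

$$\begin{aligned}
\Psi_{net}^{+} &= (\alpha-\beta)\left(\log\left(\frac{1-\beta}{\alpha}\right)+\log\left(\sqrt{\frac{1}{2\pi}}\right)-\log\left(\sqrt{2}\,\mathrm{erfinv}\left(2\frac{1-\alpha}{1-\beta}-1\right)\right)+\log\left(e^{-\left(\mathrm{erfinv}\left(2\frac{1-\alpha}{1-\beta}-1\right)\right)^2}\right)\right) \\
&= (\alpha-\beta)\log\left(\frac{1-\beta}{\alpha}\sqrt{\frac{1}{2\pi}}\,\frac{e^{-\left(\mathrm{erfinv}\left(2\frac{1-\alpha}{1-\beta}-1\right)\right)^2}}{\sqrt{2}\,\mathrm{erfinv}\left(2\frac{1-\alpha}{1-\beta}-1\right)}\right) = 0,
\end{aligned} \tag{44}$$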



where the last equality follows by assumption (33). Since we obtained that $\Psi_{net}^{+}=0$ and we never
contradicted assumption (33), following the logic presented in the previous section, the assumption must be
correct. Again, to be completely rigorous one should add that if an $\alpha$ is given and $\beta^{+}$ is such that the pair
$(\alpha,\beta^{+})$ satisfies (33) then for any $\beta<\beta^{+}$ the polytope $AT^n$ is $\beta n$-neighborly (i.e. one needs $\beta$ to be
strictly less than $\beta^{+}$ because (26) asserts that one actually needs $\Psi_{net}^{+}<0$).
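
The cancellation leading to (44) can be spelled out by grouping terms. The following is a sketch only: it abbreviates $y=\mathrm{erfinv}\left(2\frac{1-\alpha}{1-\beta}-1\right)$ and assumes that the internal exponent from (36) consists of the terms $-\beta y^2$, $-\frac{\alpha-\beta}{2}\log\frac{2}{\pi}$, $(\alpha-\beta)\log\frac{\alpha}{\alpha-\beta}$, $(\alpha-\beta)\log(\sqrt{2}y)$, and $(\alpha-\beta)\log 2$.

```latex
\begin{align*}
\Psi_{com}^{+}-\Psi_{ext}^{+}
  &= -(\alpha-\beta)\log\frac{\alpha-\beta}{1-\beta}-\alpha y^2
  && \text{(the $(1-\alpha)\log\tfrac{1-\alpha}{1-\beta}$ terms cancel)} \\
-\Psi_{int}^{+}
  &= \beta y^2+\frac{\alpha-\beta}{2}\log\frac{2}{\pi}
     -(\alpha-\beta)\log\frac{\alpha}{\alpha-\beta}
     -(\alpha-\beta)\log(\sqrt{2}y)-(\alpha-\beta)\log 2 \\
\Psi_{net}^{+}
  &= (\alpha-\beta)\left(\log\frac{1-\beta}{\alpha}
     +\frac{1}{2}\log\frac{2}{\pi}-\log 2-\log(\sqrt{2}y)-y^2\right) \\
  &= (\alpha-\beta)\left(\log\frac{1-\beta}{\alpha}
     +\log\sqrt{\frac{1}{2\pi}}-\log(\sqrt{2}y)+\log e^{-y^2}\right)
\end{align*}
```

Here the $\log(\alpha-\beta)$ terms cancel between the first two lines, the $y^2$ terms combine to $-(\alpha-\beta)y^2$, and $\frac{1}{2}\log\frac{2}{\pi}-\log 2=\log\sqrt{\frac{1}{2\pi}}$.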

We summarize the results from this section in the following theorem. 

Theorem 2. (Geometry-probability equivalence - Nonnegative x) Let $A$ in (1) be an $m\times n$ ortho-projector
(or an $m\times n$ matrix with the null-space uniformly distributed in the Grassmannian). Let $k,m,n$ be large
and let $\alpha=\frac{m}{n}$ and $\beta^{+}=\frac{k}{n}$ be constants independent of $m$ and $n$. Let erfinv be the inverse of the standard
error function associated with zero-mean unit variance Gaussian random variable. Further, let $\alpha$ and $\beta^{+}$
be such that

$$\frac{1-\beta^{+}}{\alpha}\sqrt{\frac{1}{2\pi}}\,\frac{e^{-\left(\mathrm{erfinv}\left(2\frac{1-\alpha}{1-\beta^{+}}-1\right)\right)^2}}{\sqrt{2}\,\mathrm{erfinv}\left(2\frac{1-\alpha}{1-\beta^{+}}-1\right)} = 1. \tag{45}$$

Then with overwhelming probability polytope $AT^n$ will be $\beta n$-neighborly for any $\beta<\beta^{+}$.

Further, let the unknown $x$ in (1) be $k$-sparse and nonnegative and let the location of nonzero components
of $x$ be arbitrarily chosen but fixed. Then, as shown in [43], for any $\beta<\beta^{+}$ one with overwhelming
probability has that the solution of (24) is exactly the nonnegative $\beta n$-sparse $x$ in (1).

Proof. Follows from the previous discussion through a combination of (33), (44), and the main results
of [18]. □

The results for the weak threshold obtained from the above theorem have already been plotted in [43]
and, as was mentioned in [43], they were in excellent numerical agreement with the ones obtained
in [18, 19] (for completeness we present the results again in Figure 3). Finally, Theorem 2 rigorously
establishes that the agreement is not only numerical but also analytical and that the weak thresholds obtained
in [18] and [43] are indeed exactly equal to each other.
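
Threshold values of this kind can be obtained by solving condition (45) numerically for $\beta^{+}$ at a given $\alpha$. The sketch below is one hypothetical way to do so using only the Python standard library: it multiplies (45) through by $\alpha\sqrt{2}\,\mathrm{erfinv}(\cdot)$ and finds the root in $\beta$ by bisection. The function names, the bisection bracket, and the iteration counts are assumptions of this sketch, not something taken from [18] or [43].

```python
import math


def erfinv(x):
    """Inverse of math.erf on (-1, 1), computed by plain bisection (illustrative helper)."""
    lo, hi = -6.0, 6.0  # erf(+-6) rounds to +-1 in double precision
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def y_of(alpha, beta):
    """erfinv(2(1-alpha)/(1-beta) - 1), the argument that appears throughout (45)."""
    return erfinv(2.0 * (1.0 - alpha) / (1.0 - beta) - 1.0)


def gap(alpha, beta):
    """Condition (45) multiplied through by alpha*sqrt(2)*erfinv(.), written as lhs - rhs."""
    y = y_of(alpha, beta)
    return ((1.0 - beta) * math.sqrt(1.0 / (2.0 * math.pi)) * math.exp(-y * y)
            - math.sqrt(2.0) * alpha * y)


def beta_plus(alpha):
    """Root of gap(alpha, .) on (0, alpha), located by bisection."""
    # For the alpha values tried here gap is positive at beta = 0
    # and negative as beta approaches alpha.
    lo, hi = 0.0, alpha * (1.0 - 1e-12)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gap(alpha, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lhs_of_45(alpha, beta):
    """Left-hand side of (45); it should be close to 1 at beta = beta_plus(alpha)."""
    y = y_of(alpha, beta)
    return ((1.0 - beta) / alpha * math.sqrt(1.0 / (2.0 * math.pi))
            * math.exp(-y * y) / (math.sqrt(2.0) * y))


if __name__ == "__main__":
    for i in range(1, 10):
        alpha = i / 10.0
        b = beta_plus(alpha)
        print(f"alpha={alpha:.1f}  beta_plus={b:.4f}  lhs_of_45={lhs_of_45(alpha, b):.6f}")
```

Bisection is used here purely for robustness: `gap` is positive whenever the erfinv argument is nonpositive and decreases in $\beta$ once it is positive, so a sign change on $(0,\alpha)$ pins down a single root without needing derivatives.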



[Plot omitted: "Weak threshold, $\ell_1$-optimization, signed x"; horizontal axis $\alpha$, ticks 0.1 to 0.9]

Figure 3: Weak threshold, $\ell_1$-optimization; signed x



4 Discussion 

In this paper we considered under-determined systems of linear equations with sparse solutions. We focused
on solving such systems via a classical polynomial-time $\ell_1$-optimization algorithm. We also focused on
random systems, i.e. on systems where the system matrix is random.

Two different approaches, the geometric one from [16] and the probabilistic one from [43], were considered.
These approaches were known to provide characterizations of $\ell_1$-optimization success that are in
excellent numerical agreement. Here we provided a rigorous proof that the recovery thresholds that one can 
obtain through one of these approaches are exactly the same as the ones that can be obtained through the 
other. 

We also showed that this remains true when one restricts to under-determined systems with nonnegative
and sparse solutions. Namely, we rigorously showed that the nonnegative recovery thresholds one can obtain
through either of the approaches [18] and [43] are exactly the same as the ones that can be obtained through
the other.

An interesting bonus is the following connection with the recent works [5, 17]. Namely, in [17] a
belief propagation type of algorithm is put forth as a faster alternative to the standard $\ell_1$-optimization. Its
performance was analyzed through a state evolution formalism (which was later made rigorous in [5]) and
the recovery thresholds were computed. Moreover, it was shown in [17] that these thresholds are the same
as those computed in [43]. The result of this paper confirms that the sparsity recovery abilities of the belief
propagation algorithm from [17] are exactly the same not only as those from [43] but also as those from [16]
and [18] (overwhelming numerical evidence of this was of course already presented in [17]).